Running head: HIERARCHICAL MULTINOMIAL MODELS

Hierarchical Multinomial Processing Tree Models: A Latent-Class Approach
Abstract
Multinomial processing tree models are widely used in many areas of psychology. Their application relies on the assumption of parameter homogeneity, that is, on the assumption that participants do not differ in their parameter values. Tests for parameter homogeneity are proposed that can be routinely used as part of multinomial model analyses to defend the assumption. If parameter homogeneity is found to be violated, a new family of models, termed latent-class multinomial processing tree models, can be applied that accommodates parameter heterogeneity and correlated parameters, yet preserves most of the advantages of the traditional multinomial method. Estimation, goodness-of-fit tests, and tests of other hypotheses of interest are considered for the new family of models.

In the last two decades, multinomial processing tree models, henceforth referred to as multinomial models, have been extensively used in many areas of psychology; an overview is given by Batchelder and Riefer (1999). They are usually tailored to a particular experimental task and can be used to test assumptions about the psychological processes that contribute to task performance and to assess the relative contribution of each process in a principled manner. Their increasing popularity is due to a number of advantages: Assumptions about psychological processes in a given experimental paradigm can often be cast into the form of a processing tree model in a natural manner. In addition, parameter estimation and hypothesis tests can be conducted by means of relatively straightforward maximum-likelihood techniques (Hu & Batchelder, 1994). Last but not least, multinomial models are often found to describe empirical data well.

To introduce the question of the present paper, consider a minimal multinomial model of the standard recognition task. A list of items is presented, and in a subsequent recognition test, these items are shown along with new items.
For each item, participants are asked to decide whether the item is an old one that was previously seen or a new one. Ignoring the possibility of guessing, assume that participant $t$ correctly recognizes an old item as old with probability $D_t$, and that the probability of $n_t$ correct “old” responses is given by the binomial distribution with parameters $D_t$ and $N$, the fixed number of old items presented for recognition. In multinomial modeling, data from several, say $T$, participants are usually obtained, and the assumption of parameter homogeneity is made. That is, it is assumed that the different memory parameters are equal: $D_1 = D_2 = \dots = D_T =: \mu'$. This implies that the sum score $n_+ = \sum_{t=1}^{T} n_t$ also follows a binomial distribution with expected value $TN\mu'$ and variance $TN\mu'(1-\mu')$, defining a minimal multinomial model.

The present paper deals with the possibility that there is parameter heterogeneity, that is, in the present example, that there are individual differences in memory performance. What are the consequences of such differences for multinomial model analyses? Following previous work on parameter heterogeneity (e.g., Riefer & Batchelder, 1991), assume that the participants are sampled from a population of potential participants in which the individual $D$-parameters are distributed according to a beta distribution with parameters $\alpha$ and $\beta$, both of them real numbers larger than zero. The beta distribution is a family of distributions on the interval $(0, 1)$ that is relatively tractable and accommodates a wide range of different distributional shapes. Under this assumption, the so-called beta-binomial distribution, rather than the binomial distribution, governs the number of “old” responses produced by a participant in the sample (Johnson, Kotz, & Kemp, 1993; chap. 2).
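To make the contrast between the two distributions concrete, the following minimal Python sketch evaluates the binomial and the beta-binomial probability of observing $n$ “old” responses out of $N$. The function names and the example values are illustrative assumptions and are not taken from the paper; the beta-binomial probability is written in its standard form, $\binom{N}{n} B(n+\alpha, N-n+\beta) / B(\alpha, \beta)$, with $B$ denoting the beta function.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_pmf(n, N, p):
    """Probability of n successes in N trials with a fixed success probability p."""
    return math.comb(N, n) * p ** n * (1 - p) ** (N - n)


def beta_binomial_pmf(n, N, alpha, beta):
    """Probability of n successes in N trials when p is drawn from Beta(alpha, beta)."""
    log_choose = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return math.exp(log_choose + log_beta(n + alpha, N - n + beta) - log_beta(alpha, beta))


def mean_and_variance(pmf_values):
    """Mean and variance of a distribution on 0..N given as a list of probabilities."""
    mean = sum(n * p for n, p in enumerate(pmf_values))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(pmf_values))
    return mean, var


# Illustrative values (assumptions): N = 10 old items, alpha = 2, beta = 3.
N, alpha, beta = 10, 2.0, 3.0
heterogeneous = [beta_binomial_pmf(n, N, alpha, beta) for n in range(N + 1)]
homogeneous = [binomial_pmf(n, N, alpha / (alpha + beta)) for n in range(N + 1)]
```

Both distributions have the same mean in this example, but the beta-binomial distribution has the larger variance, which is the overdispersion that matters when individual differences are ignored.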
It is instructive to reparameterize the beta distribution and the beta-binomial distribution by means of the two parameters $\mu$ and $\gamma$ as follows:

$$\mu = \frac{\alpha}{\alpha+\beta}, \qquad \gamma = \frac{1}{1+\alpha+\beta}.$$

Both parameters take on values between zero and one. The parameter $\mu$ is the expected value of the beta distribution, that is, the population mean of parameter $D$. The variance of $D$ is $\mu(1-\mu)\gamma$. The parameter $\gamma$ quantifies the extent of heterogeneity.

In multinomial model analyses, possible individual differences in $D$ are ignored, and it is assumed that the aggregated data $n_+$ follow a binomial distribution with parameters $TN$ and $\mu'$. Under this assumption, the maximum likelihood estimate of $\mu'$ is $\hat{D} = \frac{1}{TN} n_+$. If parameter homogeneity is violated, $\hat{D}$ is nevertheless an unbiased and consistent estimator of $\mu$, the population average of $D$. Erdfelder (2000; chap. 5) has shown that parameter estimates will be consistent estimates of the population averages of the individuals’ parameters for a certain subclass of multinomial models that he calls aggregation-invariant multinomial processing tree models, if, in addition, the model parameters are not correlated over persons. For more complex models, parameter estimation will in general be biased as a consequence of parameter heterogeneity, so that maximum likelihood estimates diverge systematically from the population averages of the individuals’ parameters (Riefer & Batchelder, 1991); see the more formal analysis of the consequences of parameter heterogeneity presented in Appendix A.

For the time being, consider the variance of the estimate $\hat{D}$. Given parameter homogeneity, it is $\frac{1}{TN}\mu(1-\mu)$, estimated by $\frac{1}{TN}\hat{D}(1-\hat{D})$, because $n_+$ follows a binomial distribution. In contrast, under parameter heterogeneity, the variance is $\frac{1}{TN}\mu(1-\mu)\{1+(N-1)\gamma\}$. The actual variance is thus underestimated by a factor of $1+(N-1)\gamma$ when parameter heterogeneity is ignored.
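The reparameterization and the two variance expressions can be checked numerically. The sketch below is a minimal Python illustration under the definitions given above; the function names and the example values are assumptions introduced here for illustration only.

```python
def to_mu_gamma(alpha, beta):
    """Map the beta parameters (alpha, beta) to the mean mu and the heterogeneity gamma."""
    return alpha / (alpha + beta), 1.0 / (1.0 + alpha + beta)


def to_alpha_beta(mu, gamma):
    """Inverse mapping; requires 0 < mu < 1 and 0 < gamma < 1."""
    scale = (1.0 - gamma) / gamma  # equals alpha + beta
    return mu * scale, (1.0 - mu) * scale


def inflation_factor(gamma, N):
    """Factor 1 + (N - 1) * gamma by which the binomial variance is too small."""
    return 1.0 + (N - 1) * gamma


def variance_of_d_hat(mu, gamma, T, N):
    """Variance of D-hat = n_plus / (T * N); gamma = 0 gives the binomial case."""
    return mu * (1.0 - mu) * inflation_factor(gamma, N) / (T * N)


# Illustrative values (assumptions): alpha = 2, beta = 3, T = 30 persons, N = 20 items.
mu, gamma = to_mu_gamma(2.0, 3.0)
naive_variance = variance_of_d_hat(mu, 0.0, T=30, N=20)
actual_variance = variance_of_d_hat(mu, gamma, T=30, N=20)
ratio = actual_variance / naive_variance  # equals inflation_factor(gamma, 20)
```

Because $\gamma$ lies between zero and one, the factor ranges from 1 (no heterogeneity) up to, but not including, $N$.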
Given parameter heterogeneity, the variance is thus larger, by a factor of at most $N$, than is to be expected on the basis of the binomial distribution. If this overdispersion is ignored and the too-small binomial variance is used, standard errors of model parameters will be underestimated, confidence intervals will be too narrow, and significance tests for differences between parameter values and aggregated frequencies will consequently exhibit α-error rates above the nominal significance level. These problems increase in severity as the level of heterogeneity ($\gamma$) increases and as the number of data points ($N$) collected per participant increases, as is readily apparent from the above example (the actual variance of the parameter estimate is underestimated by a factor of $1+(N-1)\gamma$). In the general case, such problems will be further exacerbated as the number of participants is increased (see Appendix A).

Furthermore, goodness-of-fit tests of multinomial models can be seen as tests of equality restrictions imposed upon the parameters of saturated models (Batchelder & Riefer, 1999). Given parameter heterogeneity, such goodness-of-fit tests will therefore also exhibit inflated α-error rates and will often lead to the rejection of simple models, even if these adequately describe each individual’s data, especially in cases where the number of data points collected per person is relatively large. In applications, this outcome often prompts researchers to work with more complex models involving more parameters to fit the aggregated data. Not infrequently, a saturated model is even used that describes the aggregated data perfectly. This strategy ensures that parameter heterogeneity will remain undetected and has the potential to distort the substantive conclusions drawn from subsequent analyses based on the fitting model.
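The effect of ignoring overdispersion on interval estimates can be illustrated by simulation. The following Python sketch is a rough Monte Carlo illustration for the minimal recognition model only, not the analysis of Appendix A; the function names, the use of a Wald-type 95% interval, and all numerical settings are assumptions made here for illustration.

```python
import math
import random


def draw_d(rng, mu, gamma):
    """Draw one person's D from the beta distribution with mean mu and heterogeneity gamma."""
    if gamma == 0:
        return mu  # parameter homogeneity: every person has D = mu
    scale = (1.0 - gamma) / gamma  # equals alpha + beta
    return rng.betavariate(mu * scale, (1.0 - mu) * scale)


def simulate_coverage(mu, gamma, T, N, reps=1000, seed=1):
    """Share of replications in which the naive binomial 95% interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        n_plus = 0
        for _ in range(T):
            d_t = draw_d(rng, mu, gamma)
            n_plus += sum(rng.random() < d_t for _ in range(N))
        d_hat = n_plus / (T * N)
        # Naive standard error that assumes n_plus is binomial with parameters T*N and mu.
        se = math.sqrt(d_hat * (1.0 - d_hat) / (T * N))
        if abs(d_hat - mu) <= 1.96 * se:
            hits += 1
    return hits / reps
```

Calling `simulate_coverage(0.6, 0.0, T=30, N=20)` and `simulate_coverage(0.6, 0.2, T=30, N=20)` contrasts the homogeneous and the heterogeneous case; under heterogeneity the naive interval should cover $\mu$ noticeably less often than the nominal 95%, which corresponds to the inflated α-error rates discussed above.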
In what follows, an extension of the multinomial method, termed latent-class multinomial processing tree models, will be discussed. Tests for parameter homogeneity will be derived that can be routinely applied in the course of traditional multinomial model analyses and that allow researchers to defend the assumption of parameter homogeneity. If, on the other hand, parameter homogeneity is violated, the latent-class extension provides a tractable method of applying multinomial models in a way that accommodates parameter heterogeneity, including correlated parameters, yet preserves most of the advantages of the multinomial modeling technique. Whereas the present paper focuses on interindividual differences between participants, items as a source of variability and dependencies are considered in the General Discussion. Before moving on to these topics, let us briefly turn to the so-called pair-clustering model that is used as a running example in this paper.

Example 1: The Pair-Clustering Model

The pair-clustering model is one of the best analyzed members of the family of multinomial models (e.g., Batchelder & Riefer, 1986, 1999; Riefer & Batchelder, 1991). It is based on a free-recall task in which participants are presented with a list of words that are related by categories. The items consist of several categorically related word pairs (e.g., oxygen and hydrogen), plus a number of singleton words. Word pairs and singletons are presented one word at a time, and participants are later asked to recall the list items in any order. The recall events are scored into mutually exclusive response categories. For the word pairs, four categories, $C_{11}$, $C_{12}$, $C_{13}$, and $C_{14}$, are distinguished:

• $C_{11}$: both words are recalled adjacently,
• $C_{12}$: both words are recalled, but not adjacently,
• $C_{13}$: only one word in the pair is recalled, and
• $C_{14}$: neither word in the pair is recalled.
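The scoring of word pairs into these four categories can be sketched in Python as follows. The function name, the representation of a recall protocol as a list of words, and the example words other than oxygen and hydrogen are assumptions made for illustration; the sketch also assumes that each word appears at most once in the recall protocol.

```python
def score_pair(recall_order, pair):
    """Return the response category (C11..C14) of one word pair.

    recall_order: list of recalled words in the order of recall.
    pair: tuple of the two words that form a categorically related pair.
    """
    first, second = pair
    position = {word: index for index, word in enumerate(recall_order)}
    first_recalled = first in position
    second_recalled = second in position
    if first_recalled and second_recalled:
        adjacent = abs(position[first] - position[second]) == 1
        return "C11" if adjacent else "C12"
    if first_recalled or second_recalled:
        return "C13"
    return "C14"


# Hypothetical recall protocol and word pairs, for illustration only.
recall = ["oxygen", "hydrogen", "dog", "chair", "cat", "lamp"]
pairs = [("oxygen", "hydrogen"), ("dog", "cat"), ("chair", "table"), ("apple", "pear")]
categories = [score_pair(recall, pair) for pair in pairs]
```

Counting how often each category occurs over the word pairs of a list yields the category frequencies that the model is fitted to.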
The recall of singletons is scored into two categories, $C_{21}$: “The singleton is recalled” and $C_{22}$: “The singleton is not recalled”. The data are the counts $n_{kj}$ with which each response category $C_{kj}$ is observed, aggregated over participants and the $N_1$ word pairs and $N_2$ singletons in the list. The model is based on four parameters: the probability of storing a word pair as a cluster ($c$), the probability of successful retrieval of a stored cluster ($r$), the probability of successful retrieval of a member of a word pair not stored as a cluster ($u$), and the probability of successful retrieval of a singleton ($a$).

Figure 1 shows the processing tree representation of the model. Word pairs are stored as a cluster with probability $c$. A stored cluster is retrieved with probability $r$, in which case both words are recalled adjacently (response category $C_{11}$). If a stored cluster cannot be retrieved, which happens with probability $1-r$, neither word of the word pair is retrieved (response category $C_{14}$). Thus, it is assumed that clustered items are accessible either as a word pair or not at all. If a word pair is not stored as a cluster, which happens with probability $1-c$, each of its two words is retrieved independently with probability $u$. The model equations are:

$$
\begin{aligned}
p(C_{11}) &= cr \\
p(C_{12}) &= (1-c)u^2 \\
p(C_{13}) &= (1-c)\,2u(1-u) \\
p(C_{14}) &= c(1-r) + (1-c)(1-u)^2 \\
p(C_{21}) &= a \\
p(C_{22}) &= 1-a
\end{aligned}
\tag{1}
$$

Frequently, a restricted model with $u = a$ is used.

Latent-Class Multinomial Processing Tree Models

A natural approach to accommodate parameter heterogeneity is to consider the model parameters as random rather than fixed effects. For this purpose, a core multinomial model will be defined as the model that describes a given participant’s data with potentially different parameters for each person. This can be extended to a hierarchical multinomial model by specifying a distribution of the parameters to model parameter heterogeneity (Raudenbush & Bryk, 2002).

Multinomial processing tree models are models for response frequencies of pre-defined mutually exclusive response categories.
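As an illustration of how a processing tree maps parameters onto the probabilities of such response categories, the pair-clustering equations (Equation 1) can be sketched in Python as follows. The function name and the example parameter values are illustrative assumptions. The squared terms for non-clustered pairs reflect the standard form of the model, in which each word of a non-clustered pair is retrieved independently with probability $u$, so that the four word-pair probabilities sum to one.

```python
def pair_clustering_probs(c, r, u, a):
    """Category probabilities of the pair-clustering model.

    c: probability that a word pair is stored as a cluster
    r: probability that a stored cluster is retrieved
    u: probability that a word of a non-clustered pair is retrieved
    a: probability that a singleton is retrieved
    Returns one dictionary per category system (word pairs, singletons).
    """
    word_pairs = {
        "C11": c * r,
        "C12": (1 - c) * u ** 2,
        "C13": (1 - c) * 2 * u * (1 - u),
        "C14": c * (1 - r) + (1 - c) * (1 - u) ** 2,
    }
    singletons = {"C21": a, "C22": 1 - a}
    return word_pairs, singletons


# Illustrative parameter values (assumptions); u = a gives the restricted model.
word_pairs, singletons = pair_clustering_probs(c=0.5, r=0.8, u=0.4, a=0.4)
```

Within each category system the probabilities sum to one for any parameter values in $(0, 1)$, which provides a quick check on the equations.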
In most cases, there are several independent category systems. For example, in the pair-clustering paradigm, responses are observed to two kinds of items, word pairs and singletons, that are scored into two category systems with four and two response categories, respectively. The separate category systems are modelled by separate subtrees of the multinomial model. Person $t$ contributes frequency counts $n_{kjt}$, where $k = 1, \dots, K$ runs over category systems, or equivalently subtrees, and $j = 1, \dots, J_k$ runs over the $J_k$ response categories of category system $k$. For each category system $k$, these frequencies are assumed to follow a multinomial distribution with parameters $p_{kjt}$, $j = 1, \dots, J_k$, and $N_k$, the fixed number of responses obtained per person $t$ and category system $k$.

A multinomial processing tree model consists of a description of the category probabilities $p_{kjt}$ by means of $S$ functionally independent parameters $\theta_{st}$, $s = 1, \dots, S$ (each $\theta_{st}$ being free to vary in $(0, 1)$): $p_{kjt} = p_{kj}(\theta_t)$, where $\theta_t$ is the vector of the $S$ parameter values of person $t$. The functions $p_{kj}(\theta)$ have a simple form (e.g., Equation 1) that permits the application of a simple EM algorithm for maximum-likelihood estimation of the model parameters (Hu & Batchelder, 1994). In the traditional analysis, it is assumed that the $\theta_t$ are equal over persons, and the response-category frequencies aggregated over persons are then sufficient statistics for the model parameters. Allowing for different parameters for each person $t$, the vector of person-wise category counts $n_t = (n_{11t}, \dots, n_{1J_1t}, \dots, n_{K1t}, \dots, n_{KJ_Kt})'$ is still modelled by a vector-valued random variable $N$ that follows a product-multinomial distribution:
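The product-multinomial probability of one person's counts can be sketched as follows. This is a minimal Python illustration of the standard form, namely the product over category systems of ordinary multinomial probabilities; the function names and the example counts are assumptions made for illustration, and in an application the category probabilities would be computed from the model equations $p_{kj}(\theta_t)$.

```python
import math


def log_multinomial_pmf(counts, probs):
    """Log-probability of the counts of one category system under a multinomial distribution."""
    total = sum(counts)
    log_coefficient = math.lgamma(total + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_kernel = 0.0
    for x, p in zip(counts, probs):
        if x > 0:
            if p <= 0:
                return float("-inf")  # an observed category with probability zero
            log_kernel += x * math.log(p)
    return log_coefficient + log_kernel


def log_product_multinomial_pmf(counts_by_system, probs_by_system):
    """Sum of the multinomial log-probabilities over the K independent category systems."""
    return sum(
        log_multinomial_pmf(counts, probs)
        for counts, probs in zip(counts_by_system, probs_by_system)
    )


# Hypothetical counts of one person: 10 word pairs (4 categories), 5 singletons (2 categories).
person_counts = [[4, 1, 2, 3], [2, 3]]
person_probs = [[0.4, 0.08, 0.24, 0.28], [0.4, 0.6]]
log_probability = log_product_multinomial_pmf(person_counts, person_probs)
```

Working on the log scale avoids numerical underflow when the numbers of responses per person, $N_k$, are large.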
Publication date: 2005